Reaching for the Sky: Maximizing Deep Learning Inference Throughput on Edge Devices with AI Multi-Tenancy
Authors
Abstract
The wide adoption of smart devices and Internet-of-Things (IoT) sensors has led to massive growth in data generation at the edge of the Internet over the past decade. Intelligent real-time analysis of such a high volume of data, particularly when leveraging highly accurate deep learning (DL) models, often requires the data to be processed as close to its sources as possible (or at the edge of the Internet) to minimize network and processing latency. The advent of specialized, low-cost, power-efficient edge devices has greatly facilitated DL inference tasks at the edge. However, limited research has been done to improve inference throughput (e.g., the number of inferences per second) by exploiting various system techniques. This study investigates such techniques, namely batched inferencing, AI multi-tenancy, and clusters of AI accelerators, which can significantly enhance the overall inference throughput on edge devices running DL models for image classification tasks. In particular, AI multi-tenancy enables collective utilization of edge devices' system resources (CPU, GPU) and AI accelerators (e.g., Edge Tensor Processing Units; EdgeTPUs). The evaluation results show that batched inferencing yields more than a 2.4× throughput improvement on devices equipped with high-performance GPUs such as the Jetson Xavier NX. Moreover, with AI multi-tenancy approaches, e.g., concurrent model executions (CME) and dynamic model placements (DMP), DL inference throughput on edge devices (with GPUs) and on EdgeTPUs can be further improved by up to 3× and 10×, respectively. Furthermore, we present a detailed analysis of the hardware and software factors that change DL inference throughput on edge devices and EdgeTPUs, thereby shedding light on areas where further throughput gains could be achieved.
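The abstract attributes the GPU gains to batched inferencing without spelling out the mechanism. A common intuition is that each inference call carries a fixed overhead (framework dispatch, kernel launches, host-to-device copies) that a larger batch amortizes over more images. The Python sketch below encodes that intuition under a deliberately simple, hypothetical cost model: `overhead_ms` and `per_image_ms` are illustrative parameters, not measurements from the paper, and `batch_latency_ms` and `throughput_ips` are names introduced here for illustration.

```python
"""Toy model of why batched inferencing can raise throughput.

Assumption (hypothetical, not from the paper): each inference call costs a
fixed overhead plus a per-image compute cost that grows linearly with the
batch size. Real devices deviate from this once memory or compute saturates.
"""


def batch_latency_ms(batch_size: int, overhead_ms: float, per_image_ms: float) -> float:
    """Latency of one inference call that processes `batch_size` images."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return overhead_ms + batch_size * per_image_ms


def throughput_ips(batch_size: int, overhead_ms: float, per_image_ms: float) -> float:
    """Throughput in inferences per second: images per call / seconds per call."""
    latency_s = batch_latency_ms(batch_size, overhead_ms, per_image_ms) / 1000.0
    return batch_size / latency_s


if __name__ == "__main__":
    # Hypothetical parameters chosen only to illustrate the trend.
    OVERHEAD_MS = 8.0
    PER_IMAGE_MS = 2.0
    base = throughput_ips(1, OVERHEAD_MS, PER_IMAGE_MS)
    for bs in (1, 2, 4, 8, 16, 32):
        t = throughput_ips(bs, OVERHEAD_MS, PER_IMAGE_MS)
        print(f"batch={bs:>2}  throughput={t:7.1f} inf/s  speedup={t / base:4.2f}x")
```

Under this model the speedup approaches (overhead_ms + per_image_ms) / per_image_ms, which is 5× for the illustrative values, as the batch grows. Real hardware usually saturates earlier, so the measured 2.4× on the Jetson Xavier NX should not be read as an output of this formula.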
Similar sources
ZNNi - Maximizing the Inference Throughput of 3D Convolutional Networks on Multi-Core CPUs and GPUs
Sliding window convolutional networks (ConvNets) have become a popular approach to computer vision problems such as image segmentation, and object detection and localization. Here we consider the problem of inference, the application of a previously trained ConvNet, with emphasis on 3D images. Our goal is to maximize throughput, defined as average number of output voxels computed per unit time....
Learning Deep Architectures for AI
Theoretical results suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g. in vision, language, and other AI-level tasks), one may need deep architectures. Deep architectures are composed of multiple levels of non-linear operations, such as in neural nets with many hidden layers or in complicated propositional formulae re-using many sub-...
ParaDrop: Enabling Lightweight Multi-tenancy at the Network's Extreme Edge
We introduce Paradrop, a specific edge computing platform that provides (modest) computing and storage resources at the “extreme” edge of the network, allowing third-party developers to flexibly create new types of services. This extreme edge of the network is the WiFi Access Point (AP) or the wireless gateway through which all end-device traffic (personal devices, sensors, etc.) passes. ...
Maximizing the Throughput of Cuckoo Hashing in Network Devices
Hash tables form a core component of network devices. Because of their large size, they are implemented using both fast on-chip SRAM and slow off-chip DRAM. However, this makes their implementation particularly delicate, as a suboptimal choice of the hashing scheme parameters may result in a higher average query time, and therefore in a lower throughput. Since hash tables are ...
Deep Learning for Causal Inference
In this paper, we propose the use of deep learning techniques in econometrics, specifically for causal inference and for estimating individual as well as average treatment effects. The contribution of this paper is twofold: 1. For generalized neighbor matching to estimate individual and average treatment effects, we analyze the use of autoencoders for dimensionality reduction while maintaining t...
Journal
Journal title: ACM Transactions on Internet Technology
Year: 2023
ISSN: 1533-5399, 1557-6051
DOI: https://doi.org/10.1145/3546192